Mission: Providing world-class computational resources and specialized services for
the most computationally intensive global challenges for researchers around the world.
[Figure: timeline of OLCF systems from Jaguar (2009) through 2012, 2017, and 2021 (Cray Shasta). Peak performance rises to 27 PF and then 200 PF (IBM); power grows from 7 MW to 9 MW, 13 MW, and 29 MW; node designs move from 1 NVIDIA GPU + 1 AMD CPU, to 6 NVIDIA GPUs + 2 Power CPUs, to 4 AMD GPUs + 1 AMD CPU.]
Four Key Challenges to Reach Exascale


What is so special about Exascale vs. Petascale? 
In 2009 there was serious concern that Exascale Systems may not be possible 


Parallelism: Exascale computers will have billion-way parallelism (also termed 
concurrency). Are there more than a handful of applications that could utilize this? 


Data Movement: The memory wall continues to grow higher. Moving data from
memory into the processors and out to storage is the main bottleneck to performance.


Reliability: Failures will happen faster than you can checkpoint a job. Exascale 
computers will need to dynamically adapt to a constant stream of transient and 
permanent failures of components. 


Energy Consumption: Research papers in 2009 predicted that a 1 Exaflop system
would consume between 150 and 500 MW. Vendors were given the ambitious goal of
trying to get this down to 20 MW.
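To put the power figures above in perspective, it can help to convert them to energy efficiency. The sketch below assumes 1 Exaflop is exactly 1e18 flop/s and uses only the 150 to 500 MW predictions and the 20 MW goal quoted above; the function name `gflops_per_watt` is illustrative, not taken from any tool.

```python
def gflops_per_watt(flops: float, megawatts: float) -> float:
    """Energy efficiency in GFlops/W for a given performance and power draw."""
    watts = megawatts * 1e6
    return flops / watts / 1e9


EXAFLOP = 1e18  # flop/s (assumption: exactly 1 EF)

# Power levels quoted in the text: 2009 predictions and the vendor goal.
cases = [
    ("2009 prediction (high)", 500),
    ("2009 prediction (low)", 150),
    ("vendor goal", 20),
]

for label, mw in cases:
    print(f"{label:>24}: {mw:>3} MW -> {gflops_per_watt(EXAFLOP, mw):5.1f} GF/W")
```

Under these assumptions the 20 MW goal corresponds to 50 GF/W, compared with roughly 2 to 7 GF/W for the 2009 predictions.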


Exascale research efforts were started to address these challenges 
After Several False Starts 
Exascale False Starts: Who Remembers


• Nexus / Plexus
• SPEC / ABLE
• Association Model


We finally got traction with:


• CORAL
• Exascale Computing Project


Supercomputer Specialization vs ORNL Summit 


• As supercomputers got larger and larger, we expected them to
be more specialized and limited to just a small number of
applications that can exploit their growing scale


• Summit’s architecture with powerful, multiple-GPU nodes with
huge memory per node seems to have stumbled into a design
that has broad capability across:


  - Traditional HPC modeling and simulation
  - High performance data analytics
  - Artificial Intelligence


ORNL Pre-Exascale System -- Summit 


System Performance 


• Peak of 200 Petaflops (FP64)
for modeling & simulation


• Peak of 3.3 ExaOps (FP16)
for data analytics and
artificial intelligence
The system includes


• 4608 nodes
• Dual-rail Mellanox EDR InfiniBand network
• 250 PB IBM file system transferring data at 2.5 TB/s


Each node has


• 2 IBM POWER9 processors
• 6 NVIDIA Tesla V100 GPUs
• 608 GB of fast memory (96 GB HBM2 + 512 GB DDR4)
• 1.6 TB of non-volatile memory (NVM)
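The per-node figures above can be scaled up to system totals with simple arithmetic. The sketch below uses only the numbers on this slide (4608 nodes, 6 GPUs, 96 GB HBM2 + 512 GB DDR4, 1.6 TB NVM per node, 200 PF peak) and assumes decimal units (1 PB = 1e6 GB); the variable names are illustrative.

```python
# Per-node and system figures taken from the slide.
NODES = 4608
GPUS_PER_NODE = 6
HBM2_GB, DDR4_GB = 96, 512
NVM_TB_PER_NODE = 1.6
PEAK_FP64_PF = 200

total_gpus = NODES * GPUS_PER_NODE
fast_mem_pb = NODES * (HBM2_GB + DDR4_GB) / 1e6   # GB -> PB (decimal units)
nvm_pb = NODES * NVM_TB_PER_NODE / 1e3            # TB -> PB (decimal units)
peak_per_node_tf = PEAK_FP64_PF * 1e3 / NODES     # PF -> TF, shared evenly per node

print(f"GPUs in the system:     {total_gpus}")
print(f"Aggregate fast memory:  {fast_mem_pb:.2f} PB")
print(f"Aggregate on-node NVM:  {nvm_pb:.2f} PB")
print(f"Peak per node (FP64):   {peak_per_node_tf:.1f} TF")
```

This gives 27,648 GPUs, about 2.8 PB of fast memory, about 7.4 PB of on-node NVM, and roughly 43 TF of peak FP64 per node.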


Multi-GPU Nodes Excel Across Simulation, Analytics, AI


[Figure: three panels labeled Advanced simulations, data analytics (label illegible in source), and Artificial intelligence]


• Data analytics: CoMet bioinformatics application for comparative genomics. Used to find sets
of genes that are related to a trait or disease in a population. Exploits cuBLAS and Volta tensor
cores to solve this problem 5 orders of magnitude faster than the previous state-of-the-art code.
  - Has achieved 2.36 ExaOps mixed precision (FP16-FP32) on Summit
• Deep Learning: global climate simulations use a half-precision version of the DeepLabv3+
neural network to learn to detect extreme weather patterns in the output
  - Has achieved a sustained throughput of 1.0 ExaOps (FP16) on Summit
• Nonlinear dynamic low-order unstructured finite-element solver accelerated using mixed
precision (FP16 thru FP64) and an AI-generated preconditioner. Answer in FP64
  - Has achieved 25.3-fold speedup on Japan earthquake - city structures simulation


• Half-dozen Early Science codes are reporting >25x speedup on Summit vs. Titan
Multi-GPU Nodes Excel in Performance, Data, and Energy Efficiency
Summit achieved #1 on TOP500, #1 on HPCG, and #1 on Green500


[Figure: three award certificates for Summit, June 2018]


TOP500 certificate: Summit, an IBM Power System AC922 at the
U.S. Department of Energy / SC / Oak Ridge National Laboratory, TN, USA,
is ranked No. 1 among the World's TOP500 Supercomputers
with 122.3 PFlop/s Linpack Performance
on the TOP500 List published at ISC High Performance, June 25, 2018.


HPCG certificate (June 2018): Summit, DOE/SC/ORNL, achieved 2.9 Pflop/s.


Green500 certificate: Summit, an IBM Power CPU + NVIDIA Volta GV100 GPU System at
DOE/SC/Oak Ridge National Laboratory in the United States,
is ranked amongst Level 3-measured systems as No. 1 in the Green500
among the World’s TOP500 Supercomputers
with 13.889 GFlops/Watt Linpack Efficiency
on the Green500 List published at ISC High Performance, June 25, 2018.


122 PF HPL
Shows DP performance


2.9 PF HPCG
Shows fast data movement


13.889 GF/W
Shows energy efficiency
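The HPL and Green500 figures above can be combined to estimate the power drawn during the benchmark. The sketch below assumes both numbers (122.3 PFlop/s and 13.889 GFlops/Watt) describe the same measured run, which the slide does not state explicitly; `implied_power_mw` is an illustrative helper name.

```python
def implied_power_mw(hpl_pflops: float, gflops_per_watt: float) -> float:
    """Power implied by a Linpack result and an efficiency figure, in MW."""
    flops = hpl_pflops * 1e15
    watts = flops / (gflops_per_watt * 1e9)
    return watts / 1e6


# Figures from the TOP500 and Green500 certificates (June 2018).
power_mw = implied_power_mw(122.3, 13.889)
print(f"Implied power during HPL: {power_mw:.1f} MW")
```

Under that assumption the run drew roughly 8.8 MW.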
Frontier Continues the Accelerated Node Design
begun with Titan and continued with Summit 


Partnership between ORNL, Cray, and AMD 
The Frontier system will be delivered in 2021 


Peak Performance greater than 1.5 EF 


Composed of more than 100 Cray Shasta cabinets 


Connected by Slingshot™ interconnect with adaptive routing, congestion control, and quality of service 


Accelerated Node Architecture:
• One purpose-built AMD EPYC™ processor
• Four HPC and AI optimized Radeon Instinct™ GPU accelerators
• Fully connected with high speed AMD Infinity Fabric links
• Coherent memory across the node
• 100 GB/s node injection bandwidth
• On-node NVM storage
Comparison of Titan, Summit, and Frontier Systems


Specs                | Titan                                  | Summit                                         | Frontier
---------------------+----------------------------------------+------------------------------------------------+--------------------------------------------------------
Peak                 | 27 PF                                  | 200 PF                                         | > 1.5 EF
# cabinets           | 200                                    | 256                                            | > 100
Node                 | 1 AMD Opteron CPU                      | 2 IBM POWER9™ CPUs                             | 1 AMD EPYC CPU
                     | 1 NVIDIA Kepler GPU                    | 6 NVIDIA Volta GPUs                            | 4 AMD Radeon Instinct GPUs
On-node interconnect | PCI Gen2                               | NVIDIA NVLINK                                  | AMD Infinity Fabric
                     | No coherence across the node           | Coherent memory across the node                | Coherent memory across the node
System interconnect  | Cray Gemini network, 6.4 GB/s          | Mellanox dual-port EDR IB network, 25 GB/s     | Cray four-port Slingshot network, 100 GB/s
Topology             | 3D Torus                               | Non-blocking Fat Tree                          | Dragonfly
Storage              | 32 PB, 1 TB/s, Lustre Filesystem       | 250 PB, 2.5 TB/s, IBM Spectrum Scale™ with GPFS™ | 4x performance and 3x capacity of Summit’s I/O subsystem
On-node NVM          | No                                     | Yes                                            | Yes
Power                | 9 MW                                   | 13 MW                                          | 29 MW
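One way to read the comparison is to normalize power by peak performance. The sketch below uses the peak and power rows (27 PF at 9 MW, 200 PF at 13 MW, more than 1.5 EF at 29 MW) and assumes that Frontier's peak is taken at its stated lower bound of 1.5 EF and that the listed power corresponds to peak operation; the helper `mw_per_exaflop` is illustrative.

```python
# Peak performance (PF) and power (MW) for each system, as listed in the comparison.
systems = {
    "Titan": {"peak_pf": 27, "power_mw": 9},
    "Summit": {"peak_pf": 200, "power_mw": 13},
    "Frontier": {"peak_pf": 1500, "power_mw": 29},  # assumption: ">1.5 EF" at its lower bound
}


def mw_per_exaflop(peak_pf: float, power_mw: float) -> float:
    """Power needed per Exaflop of peak performance (1 EF = 1000 PF)."""
    return power_mw / (peak_pf / 1000.0)


for name, spec in systems.items():
    ratio = mw_per_exaflop(spec["peak_pf"], spec["power_mw"])
    print(f"{name:<8} {ratio:6.1f} MW per peak EF")
```

With these inputs the ratio drops from about 333 MW per peak Exaflop for Titan to 65 for Summit and about 19 for Frontier. These are peak rather than sustained figures, so they are indicative only.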
Moving Applications from Titan and Summit to Frontier


ORNL, Cray, and AMD are partnering to co-design and develop enhanced GPU 
programming tools. 


  - These new capabilities in the Cray Programming Environment and AMD’s ROCm
open compute platform will be integrated into the Cray Shasta software stack.


HIP (Heterogeneous-compute Interface for Portability) is an API developed by AMD
that allows developers to write portable code to run on AMD or NVIDIA GPUs.


  - The API is very similar to CUDA, so transitioning existing codes from CUDA to HIP is
fairly straightforward.


  - OLCF has HIP available on Summit so that users can begin using it prior to its
availability on Frontier.
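To illustrate how close the two APIs are, the toy renamer below maps a handful of CUDA runtime names to their HIP counterparts. It is only a sketch: the mapping covers a small subset of real names (`cudaMalloc` to `hipMalloc`, and so on), the function `toy_hipify` and the sample snippet are hypothetical, and real ports would normally use AMD's own translation tools rather than this script.

```python
import re

# Small, illustrative subset of CUDA runtime names and their HIP counterparts.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Try longer names first so "cudaMemcpyHostToDevice" is not cut short by "cudaMemcpy".
_PATTERN = re.compile(
    "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def toy_hipify(source: str) -> str:
    """Replace the known CUDA names in `source` with their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


# Hypothetical CUDA host code used only to exercise the renamer.
cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

hip_snippet = toy_hipify(cuda_snippet)
print(hip_snippet)
```

For these calls the translation is a one-to-one rename, which is the sense in which the transition is described as fairly straightforward.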


In addition, Frontier will support many of the same compilers, programming models, 
and tools that have been available to OLCF users on both the Titan and Summit 
supercomputers 
Solutions to the Four Exascale Challenges
How Frontier addresses the challenges 


Parallelism: The GPUs hide between 1,000- and 10,000-way concurrency inside their
pipelines, so users don’t have to think about as much parallelism. Summit has
shown that the multi-GPU node design can do well in simulation, data, and learning.


Data Movement: Having High Bandwidth Memory soldered onto the GPU increases
bandwidth by an order of magnitude, and GPUs are well suited for latency hiding.


Reliability: Having on-node NVM (Non-Volatile Memory) reduces checkpoint times
from minutes to seconds. Cray's adaptive network and system software aid in keeping
the system up despite component failures.


Energy Consumption: Frontier is projected to use less than 20 MW per 1 Exaflop,
due in part to the 10 years of DOE investment in vendors for Exascale technologies
(FastForward, DesignForward, PathForward).
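The reliability point above (checkpoint times dropping from minutes to seconds) can be made concrete with Young's first-order approximation for the checkpoint interval, tau = sqrt(2 * C * M), where C is the time to write one checkpoint and M is the mean time between failures. The sketch below is an illustration under assumed numbers: the 4-hour MTBF and the 300 s and 10 s checkpoint times are hypothetical, not figures from the slides, and the overhead model is only valid when C is much smaller than M.

```python
import math


def young_interval(checkpoint_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the best time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)


def overhead_fraction(checkpoint_s: float, mtbf_s: float) -> float:
    """Approximate fraction of run time lost to checkpointing and rework."""
    tau = young_interval(checkpoint_s, mtbf_s)
    # Time spent writing checkpoints plus expected work redone after a failure.
    return checkpoint_s / tau + tau / (2.0 * mtbf_s)


MTBF_S = 4 * 3600  # hypothetical system MTBF of 4 hours

for label, checkpoint_s in [("minutes (300 s)", 300.0), ("seconds (10 s)", 10.0)]:
    tau = young_interval(checkpoint_s, MTBF_S)
    lost = overhead_fraction(checkpoint_s, MTBF_S)
    print(f"checkpoint in {label:<16} interval {tau:7.0f} s, overhead {lost:5.1%}")
```

Under these assumed values, cutting the checkpoint time from minutes to seconds reduces the estimated overhead from about a fifth of the run time to a few percent.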
Questions?


ORNL / Cray / AMD Partnership 

